We will be using the Jupyter notebook for many activities in this module. Every notebook is attached to a "kernel", the process that runs its code in a given language. We will be using the Python 3 kernel from the IPython project.
Python is a programming language that has been under development for over 25 years [1]; it was conceived in the late 1980s by Guido van Rossum.
His goal was to create a simple and intuitive computer language that is powerful and uses a computer syntax as understandable as English text.
Python is a general-purpose language, which means you can use it in any field to:
It is available on all computers, tablets and smartphones. Python is an interpreted language. This means that you can write Python commands and the computer can execute these instructions directly (using a Python interpreter).
The biggest companies use Python: Google, Facebook, Wikipedia, EDF, Airbus ...
Python is now one of the most widely used programming languages in the world: see for example
This Chapter will not cover everything in Python. If you would like, please consider the following resources:
Getting Started with Python:
Learning Python in Notebooks:
This is handy to always have available for reference:
Python Reference:
Python is an imperative language based on statements. That is, programs in Python consist of lines composed of statements. A statement can be:
https://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator
Comments in Python start with the hash character, #
# this is the first comment
spam = 1 # and this is the second comment
# ... and now a third!
text = "# This is not a comment because it's inside quotes."
The operators +, -, * and / work just like in most other languages (for example, Pascal or C); parentheses () can be used for grouping.
1
2
-3
3.14
3.14
5 ** 2 # 5 squared, or 5 to the power of 2
25
50 - 5 * 6
20
5 * 3 + 2 # floored quotient * divisor + remainder
17
(50 - 5 * 6) / 4
5.0
17 % 3 # the % operator returns the remainder of the division
2
17 // 3 # floor division
5
width = 20 # The equal sign (=) is used to assign a value to a variable.
height = 5 * 9 # Afterwards, no result is displayed before the next interactive prompt
width * height
900
tax = 12.5 / 100
price = 100.50
price * tax
12.5625
price + _ # In interactive mode, the last printed expression is assigned to the variable `_`
113.0625
'apple' # single quotes
'apple'
'doesn\'t' # use \' to escape the single quote...
"doesn't"
"apple" # ...or use double quotes instead
'apple'
print('"Isn\'t," they said.')
"Isn't," they said.
first_str = 'First line.\nSecond line.' # \n means newline
first_str
'First line.\nSecond line.'
print(first_str)
First line.
Second line.
print('C:\some\name') # here \n means newline!
print(r'C:\some\name') # note the r before the quote
C:\some
ame
C:\some\name
String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. The following example:
print("""\
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
""")
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
Strings can be concatenated (glued together) with the + operator, and repeated with *:
# 3 times 'un', followed by 'ium'
3 * 'un' + 'ium'
'unununium'
Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated.
'Py' 'thon'
'Python'
This only works with two literals though, not with variables or expressions:
prefix = 'Py'
prefix 'thon' # can't concatenate a variable and a string literal
File "<ipython-input-51-20527b549823>", line 2 prefix 'thon' # can't concatenate a variable and a string literal ^ SyntaxError: invalid syntax
If you want to concatenate variables or a variable and a literal, use +:
prefix + 'thon'
'Python'
This feature is particularly useful when you want to break long strings:
text = ('Put several strings within parentheses '
'to have them joined together.')
text
'Put several strings within parentheses to have them joined together.'
Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one.
word = 'Python'
word[0] # character in position 0
'P'
word[5] # character in position 5
'n'
word[-1] # last character
'n'
word[-2] # second-last character
'o'
Note that since -0 is the same as 0, negative indices start from -1.
In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:
word[0:2] # characters from position 0 (included) to 2 (excluded)
'Py'
word[2:5] # characters from position 2 (included) to 5 (excluded)
'tho'
word[:2] # characters from the beginning to position 2 (excluded)
'Py'
word[4:] # characters from position 4 (included) to the end
'on'
word[-2:] # characters from the second-last (included) to the end
'on'
One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
The first row of numbers gives the position of the indices 0…6 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.
For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.
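The two slicing rules described here can be checked directly. The following is a minimal sketch using the same word as above; the helper names piece, rejoined and all_equal are only illustrative.

```python
word = 'Python'

# Length of a slice with in-range, non-negative indices is stop - start.
piece = word[1:3]
piece_length = len(piece)

# Splitting at any index i and joining the halves gives the original back.
rejoined = [word[:i] + word[i:] for i in range(len(word) + 1)]
all_equal = all(s == word for s in rejoined)

print(piece, piece_length, all_equal)  # yt 2 True
```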
Attempting to use an index that is too large will result in an error:
word[42] # the word only has 6 characters
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-62-469c6d99b5b2> in <module> ----> 1 word[42] # the word only has 6 characters IndexError: string index out of range
However, out of range slice indexes are handled gracefully when used for slicing:
word[4:42]
'on'
Python strings cannot be changed — they are immutable. Therefore, assigning to an indexed position in the string results in an error:
word[0] = 'J'
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-64-91a956888ca7> in <module> ----> 1 word[0] = 'J' TypeError: 'str' object does not support item assignment
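Since a string cannot be modified in place, the usual approach is to build a new string from pieces of the old one. A small sketch, assuming the same word as above (new_word and shorter are illustrative names):

```python
word = 'Python'

# Build new strings from pieces of the old one; `word` itself is unchanged.
new_word = 'J' + word[1:]
shorter = word[:2] + 'py'

print(new_word)  # Jython
print(shorter)   # Pypy
print(word)      # Python
```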
The built-in function len() returns the length of a string:
True
False
Python has four very useful data structures built into the language:
dict: {key: value, ...}. A dictionary is mutable, but its keys cannot be duplicated.
list: [item, ...]. A list is mutable, i.e. we can change its contents.
tuple: (item, ...). A tuple is immutable, i.e. we cannot change its contents.
set: {item, ...}. A set is mutable, but its elements cannot be duplicated. Note that {} on its own creates an empty dict; use set() for an empty set.
See this example for more details:
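The mutability and duplication rules of these containers can be illustrated with a short sketch. All variable names and values below are made up for the illustration.

```python
# list: mutable, keeps duplicates and order
numbers = [1, 2, 2, 3]
numbers.append(4)

# tuple: immutable, item assignment raises TypeError
point = (1, 2, 3)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# set: mutable, duplicates collapse into one element
unique = {1, 2, 2, 3}
unique.add(3)

# dict: mutable, assigning to an existing key overwrites its value
ages = {"ada": 36}
ages["ada"] = 37
ages["alan"] = 41

print(numbers)        # [1, 2, 2, 3, 4]
print(tuple_changed)  # False
print(unique)         # {1, 2, 3}
print(ages)           # {'ada': 37, 'alan': 41}
```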
# list
[1, 2, 3]
squares = [1, 4, 9, 16, 25]
squares
[1, 4, 9, 16, 25]
squares[1]
4
squares + [36, 49, 64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# replace some values
squares[2:4] = [36, 49, 81]
squares
[1, 4, 36, 49, 81, 25]
# clear the list by replacing all the elements with an empty list
squares = [] # or squares[:] = []
squares
[]
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
len(letters)
7
alpha_numeric = [1, 'b', 'c', 23, 'e', -6, 'g']
alpha_numeric
[1, 'b', 'c', 23, 'e', -6, 'g']
#tuple
(1, 2, 3)
(1, 2, 3)
# tuple
1, 2, 3
(1, 2, 3)
# set
{1, 2, 3}
{1, 2, 3}
# dict: data structure which stores key value pairs
{"apple": "a fruit", "banana": "an herb", "monkey": "a mammal"}
{'apple': 'a fruit', 'banana': 'an herb', 'monkey': 'a mammal'}
{"apple": "a fruit", "banana": "an herb", "monkey": "a mammal"}["apple"]
There are two ways to call functions in Python:
Infix operator name:
1 + 2
abs(-1)
import operator
operator.add(1, 2)
Evaluating an expression and displaying its result as an Out, versus evaluating and printing the result (a side-effect).
print(1)
# Python 3: Simple output (with Unicode)
print("Hello, I'm Python!")
name = input('What is your name?\n')
print('Hi, %s.' % name)
Hi, Pape.
None
What happened? All functions return something, even if you don't specify it. If you don't specify a return value, then it will default to returning None.
For more detail, see defining functions
def plus(a, b):
    return a + b
plus(3, 4)
def plus(a, b):
    a + b
plus(3, 4)
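The difference between the two versions of plus can be made explicit by storing the results. This is a small sketch; the two function names are hypothetical and only used to keep both versions side by side.

```python
def plus_with_return(a, b):
    return a + b

def plus_without_return(a, b):
    a + b  # the sum is computed, then discarded

first = plus_with_return(3, 4)
second = plus_without_return(3, 4)

print(first)           # 7
print(second)          # None
print(second is None)  # True
```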
# Fibonacci series up to n
def fib(n):
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()
fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
def ask_ok(prompt, retries=4, reminder='Please try again!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise ValueError('invalid user response')
        print(reminder)
ask_ok('Do you really want to quit?')
Please try again!
True
ask_ok('OK to overwrite the file?', 2, 'Come on, only')
False
Classes provide a means of bundling data and functionality together. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by its class) for modifying its state.
Class objects support two kinds of operations: attribute references and instantiation.
Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class’s namespace when the class object was created. So, if the class definition looked like this:
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
then MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively.
__doc__ is also a valid attribute, returning the docstring belonging to the class: "A simple example class".
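These attribute references can be tried out directly on the class. A minimal sketch, repeating the class definition so that the block runs on its own (the name instance is illustrative):

```python
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

print(MyClass.i)            # 12345
print(MyClass.__doc__)      # A simple example class
print(callable(MyClass.f))  # True

# Calling the function through the class requires passing an instance explicitly.
instance = MyClass()
print(MyClass.f(instance))  # hello world
print(instance.f())         # hello world
```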
# creates a new instance of the class and assigns this object to the local variable x.
x = MyClass()
When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance.
Of course, the __init__() method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to __init__(). For example,
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart
x = Complex(3.0, -4.5)
x.r, x.i
(3.0, -4.5)
class Dog:
    kind = 'canine' # class variable shared by all instances
    def __init__(self, name):
        self.name = name # instance variable unique to each instance
d = Dog('Fido')
e = Dog('Buddy')
print(d.kind) # shared by all dogs
print(e.kind) # shared by all dogs
print(d.name) # unique to d
print(e.name) # unique to e
canine canine Fido Buddy
class Foo:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
    def my_name(self):
        return f"My name is: {self.first_name} {self.last_name.upper()}"
Foo(first_name="Jean Paul", last_name="Seck").my_name()
'My name is: Jean Paul SECK'
# Methods may call other methods by using method attributes of the self argument:
class Bag:
    def __init__(self, x):
        self.x = x
    def add(self, y):
        return self.x + y
    def addtwice(self, y):
        return self.add(y)
Bag(3).addtwice(2)
5
The Python interpreter has a number of functions and types built into it that are always available. They are listed here in the following link: built-in
abs(-123)
123
len("my name is")
10
float('+1.23')
1.23
isinstance(223, int)
True
max([2, -23, 12, 1239]) # largest item in an iterable
1239
pow(2, 3)
8
sorted([2, -23, 12, 1239], reverse=False) # a new sorted list from the items in iterable.
[-23, 2, 12, 1239]
sum([2, -23, 12, 1239])
1230
Standard packages -examples:
scipy extends the possibilities offered by NumPy, in particular by proposing algorithms commonly used in scientific computing. It provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems.
matplotlib allows you to create graphs (static, animated, and interactive) in Python
import numpy as np
a = [1, 2, 3]
np.array(a)
array([1, 2, 3])
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(10, 0, -1)
array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
# array & dimensions
w = np.array([[1, 2], [3, 4], [5, 6]])
w
array([[1, 2],
[3, 4],
[5, 6]])
w.ndim
2

w.shape
(3, 2)
w.size
6
a = np.arange(0, 6)
a.shape
(6,)
b = a.reshape((2, 3))
print(b)
b.shape
[[0 1 2] [3 4 5]]
(2, 3)
np.transpose(b) # b.T
array([[0, 3],
[1, 4],
[2, 5]])
np.dot(b, b.T) # matrix product
array([[ 5, 14],
[14, 50]])
a = np.diag((1, 2, 3)) # diagonal matrix
a
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
np.linalg.det(a)
6.0
a1 = np.ones((3, 3), int) # float, str
np.zeros((2,3), int)
array([[0, 0, 0],
[0, 0, 0]])
import matplotlib.pyplot as plt
temps = [1, 2, 3, 4, 6, 7, 9]
concentration = [5.5, 7.2, 11.8, 13.6, 19.1, 21.7, 29.4]
plt.scatter(temps, concentration, marker="o", color="blue")
plt.xlabel("Time (h)")
plt.ylabel("Concentration (mg/L)")
plt.title("Product concentration vs. time")
plt.show()
plt.plot(temps, concentration, color='green', ls="--")
plt.grid()
plt.savefig("concentration_vs_temps.png", bbox_inches='tight', dpi=200)
sequence = "ACGATCATAGCGAGCTACGTAGAA"
bases = ["A", "C", "G", "T"]
distribution = []
for base in bases:
    distribution.append(sequence.count(base))
x = np.arange(len(bases))
plt.bar(x, distribution)
plt.xticks(x, bases)
plt.xlabel("Base")
plt.ylabel("Number")
plt.title(f"Distribution of bases in: {sequence}")
plt.savefig("distribution_basis.png", bbox_inches="tight", dpi=200)
import pandas as pd
# series: corresponds to a one-dimensional vector
s = pd.Series([10, 20, 30, 40], index = ['a', 'b', 'c', 'd'])
s
a 10 b 20 c 30 d 40 dtype: int64
s.shape
(4,)
s.head(2)
a 10 b 20 dtype: int64
s.tail(2)
c 30 d 40 dtype: int64
s.iloc[0], s["a"]
(10, 10)
s.iloc[[1, 3]]
b 20 d 40 dtype: int64
s["c"] = 300
s["z"] = 50
s
a 10 b 20 c 300 d 40 z 50 dtype: int64
# dataframe: corresponds to a two-dimensional table with labels to name the rows and columns.
df = pd.DataFrame(columns=["a", "b", "c", "d",],
index=["chat", "singe", "souris"],
data=[np.arange(10, 14),
np.arange(20, 24),
np.arange(30, 34),
])
df
| | a | b | c | d |
|---|---|---|---|---|
| chat | 10 | 11 | 12 | 13 |
| singe | 20 | 21 | 22 | 23 |
| souris | 30 | 31 | 32 | 33 |
df.shape
(3, 4)
df.columns
Index(['a', 'b', 'c', 'd'], dtype='object')
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 3 entries, chat to souris Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 a 3 non-null int64 1 b 3 non-null int64 2 c 3 non-null int64 3 d 3 non-null int64 dtypes: int64(4) memory usage: 120.0+ bytes
df.describe()
| | a | b | c | d |
|---|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 20.0 | 21.0 | 22.0 | 23.0 |
| std | 10.0 | 10.0 | 10.0 | 10.0 |
| min | 10.0 | 11.0 | 12.0 | 13.0 |
| 25% | 15.0 | 16.0 | 17.0 | 18.0 |
| 50% | 20.0 | 21.0 | 22.0 | 23.0 |
| 75% | 25.0 | 26.0 | 27.0 | 28.0 |
| max | 30.0 | 31.0 | 32.0 | 33.0 |
df.columns = ["Paris", "Lyon", "Nantes", "Pau"]
df.columns
Index(['Paris', 'Lyon', 'Nantes', 'Pau'], dtype='object')
df.head(2)
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| chat | 10 | 11 | 12 | 13 |
| singe | 20 | 21 | 22 | 23 |
df["gender"] = ["f", None, "m"]
df
| | Paris | Lyon | Nantes | Pau | gender |
|---|---|---|---|---|---|
| chat | 10 | 11 | 12 | 13 | f |
| singe | 20 | 21 | 22 | 23 | None |
| souris | 30 | 31 | 32 | 33 | m |
df.isnull().mean()
Paris 0.000000 Lyon 0.000000 Nantes 0.000000 Pau 0.000000 gender 0.333333 dtype: float64
import missingno as msno
msno.bar(df, figsize=(5, 5), fontsize=12, color="dodgerblue");
msno.matrix(df);
df["Lyon"]
chat 11 singe 21 souris 31 Name: Lyon, dtype: int64
df[["Lyon", "Pau"]]
| | Lyon | Pau |
|---|---|---|
| chat | 11 | 13 |
| singe | 21 | 23 |
| souris | 31 | 33 |
# row selection: use .loc
df.loc["singe"]
Paris 20 Lyon 21 Nantes 22 Pau 23 Name: singe, dtype: int64
df.loc[["singe", "chat"]]
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| singe | 20 | 21 | 22 | 23 |
| chat | 10 | 11 | 12 | 13 |
df.loc["souris", "Pau"]
33
df.loc[["singe", "chat"], ["Lyon", "Nantes"]]
| | Lyon | Nantes |
|---|---|---|
| singe | 21 | 22 |
| chat | 11 | 12 |
# row selection with index
df.iloc[[1,0]]
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| singe | 20 | 21 | 22 | 23 |
| chat | 10 | 11 | 12 | 13 |
df.iloc[0:2]
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| chat | 10 | 11 | 12 | 13 |
| singe | 20 | 21 | 22 | 23 |
# selection with condition
df[ df["Pau"]>15 ]
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| singe | 20 | 21 | 22 | 23 |
| souris | 30 | 31 | 32 | 33 |
df.loc[df["Pau"]>15, ["Lyon"]]
| | Lyon |
|---|---|
| singe | 21 |
| souris | 31 |
df[(df["Pau"]>15) & (df["Lyon"]>25)]
| | Paris | Lyon | Nantes | Pau |
|---|---|---|---|---|
| souris | 30 | 31 | 32 | 33 |
## dataframe combination
data1 = {"Lyon": [10, 23, 17], "Paris": [3, 15, 20]}
df1 = pd.DataFrame.from_dict(data1)
df1.index = ["chat", "singe", "souris"]
df1
| | Lyon | Paris |
|---|---|---|
| chat | 10 | 3 |
| singe | 23 | 15 |
| souris | 17 | 20 |
data2 = {"Nantes": [3, 9, 14], "Strasbourg": [5, 10, 8]}
df2 = pd.DataFrame.from_dict(data2)
df2.index = ["chat", "souris", "lapin"]
df2
| | Nantes | Strasbourg |
|---|---|---|
| chat | 3 | 5 |
| souris | 9 | 10 |
| lapin | 14 | 8 |
df3 = pd.concat([df1, df2])
df3
| | Lyon | Paris | Nantes | Strasbourg |
|---|---|---|---|---|
| chat | 10.0 | 3.0 | NaN | NaN |
| singe | 23.0 | 15.0 | NaN | NaN |
| souris | 17.0 | 20.0 | NaN | NaN |
| chat | NaN | NaN | 3.0 | 5.0 |
| souris | NaN | NaN | 9.0 | 10.0 |
| lapin | NaN | NaN | 14.0 | 8.0 |
pd.concat([df1, df2], axis=1)
| | Lyon | Paris | Nantes | Strasbourg |
|---|---|---|---|---|
| chat | 10.0 | 3.0 | 3.0 | 5.0 |
| singe | 23.0 | 15.0 | NaN | NaN |
| souris | 17.0 | 20.0 | 9.0 | 10.0 |
| lapin | NaN | NaN | 14.0 | 8.0 |
pd.concat([df1, df2], axis=1, join="inner")
| | Lyon | Paris | Nantes | Strasbourg |
|---|---|---|---|---|
| chat | 10 | 3 | 3 | 5 |
| souris | 17 | 20 | 9 | 10 |
import seaborn as sns
data_open = sns.load_dataset("tips") # iris, diamonds, penguins, etc, car_crashes
print(data_open.shape)
data_open.head()
(244, 7)
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
# see https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html, for more examples
data_open.plot()
<AxesSubplot:>
data_open.total_bill.plot(kind="line", figsize=(10, 5))
<AxesSubplot:>
data_open.plot(kind="scatter", x="total_bill", y="tip", figsize=(10, 5))
<AxesSubplot:xlabel='total_bill', ylabel='tip'>
data_open.day.value_counts(normalize=False).plot(kind="pie", figsize=(10, 5), autopct='%1.1f%%');
data_open[["day", "time"]].value_counts(normalize=False)
day time
Sat Dinner 87
Sun Dinner 76
Thur Lunch 61
Fri Dinner 12
Lunch 7
Thur Dinner 1
dtype: int64
(data_open.sex.value_counts(normalize=False)
.plot(kind="pie", figsize=(10, 5), autopct='%1.1f%%',
wedgeprops=dict(width=0.5, edgecolor='w')
)
);
<AxesSubplot:>
sns.relplot(data=data_open, kind="scatter", y="total_bill", x="tip",
hue="time", height=5, aspect=1.5,);
sns.boxplot(data=data_open, x="total_bill", y="day", orient="h")
<AxesSubplot:xlabel='total_bill', ylabel='day'>
sns.catplot(data=data_open, kind="box", x="total_bill", y="day", hue="smoker",);
sns.catplot(data=data_open, kind="violin", x="total_bill", y="sex", hue="smoker",
orient="h", height=5, aspect=1.5, split=True, inner="quart", linewidth=1,
palette={"Yes": "b", "No": ".85"});
data_open.day.value_counts(normalize=False)
Sat 87 Sun 76 Thur 62 Fri 19 Name: day, dtype: int64
sns.catplot(data=data_open, kind="bar", x="day", y="tip", hue="smoker", ci=None,
height=5, aspect=1.5, estimator=np.sum);
sns.relplot(data=data_open, kind="scatter", x="total_bill", col="day", col_wrap=2,
y="tip", height=5, aspect=1.5);
sns.pairplot(data_open, hue="sex", diag_kind="auto", # hist, kde
x_vars=None,
y_vars=None,
corner=True, # plot only the lower triangle:
);
g = sns.lmplot(
data=data_open,
x="total_bill", y="tip", hue="time",
height=5
)
# interactive graph
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig.show()
fig = px.scatter(df, x="sepal_width", y="sepal_length",
color="species",
marginal_y="violin",
marginal_x="box", trendline="ols", template="simple_white")
fig.show()
fig = px.bar(data_open, x="sex", y="total_bill", color="smoker", barmode="group")
fig.show()
fig = px.bar(data_open, x="sex", y="total_bill", color="smoker",
barmode="group", facet_row="time", facet_col="day",
category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
fig.show()
fig = px.histogram(data_open, x="total_bill", y="tip", color="sex",
marginal=None, hover_data=data_open.columns)
fig.show()
fig = px.scatter_matrix(df, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
color="species")
fig.show()
fig = px.parallel_categories(data_open, color="size", color_continuous_scale=px.colors.sequential.Inferno)
fig.show()
df_country = px.data.gapminder()
print(df_country.shape)
df_country.head()
(1704, 8)
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
fig = px.scatter(df_country.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent",
hover_name="country", log_x=True, size_max=60)
fig.show()
fig = px.area(df_country, x="year", y="pop", color="continent", line_group="country",
title="Evolution of the population")
fig.show()
df_country.country.unique()
array(['Afghanistan', 'Other countries', 'Albania', 'Algeria', 'Angola',
'Argentina', 'Australia', 'Austria', 'Bangladesh', 'Belgium',
'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Brazil', 'Bulgaria',
'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada',
'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
'Congo, Dem. Rep.', 'Congo, Rep.', 'Costa Rica', "Cote d'Ivoire",
'Croatia', 'Cuba', 'Czech Republic', 'Denmark',
'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Eritrea',
'Ethiopia', 'Finland', 'France', 'Germany', 'Ghana', 'Greece',
'Guatemala', 'Guinea', 'Haiti', 'Honduras', 'Hong Kong, China',
'Hungary', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland',
'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kenya',
'Korea, Dem. Rep.', 'Korea, Rep.', 'Kuwait', 'Lebanon', 'Lesotho',
'Liberia', 'Libya', 'Madagascar', 'Malawi', 'Malaysia', 'Mali',
'Mauritania', 'Mexico', 'Mongolia', 'Morocco', 'Mozambique',
'Myanmar', 'Namibia', 'Nepal', 'Netherlands', 'New Zealand',
'Nicaragua', 'Niger', 'Nigeria', 'Norway', 'Oman', 'Pakistan',
'Panama', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal',
'Puerto Rico', 'Romania', 'Rwanda', 'Saudi Arabia', 'Senegal',
'Serbia', 'Sierra Leone', 'Singapore', 'Slovak Republic',
'Slovenia', 'Somalia', 'South Africa', 'Spain', 'Sri Lanka',
'Sudan', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tanzania',
'Thailand', 'Togo', 'Tunisia', 'Turkey', 'Uganda',
'United Kingdom', 'United States', 'Uruguay', 'Venezuela',
'Vietnam', 'West Bank and Gaza', 'Yemen, Rep.', 'Zambia',
'Zimbabwe'], dtype=object)
fig = px.pie(df_country, values='pop', names='continent', title='Population per continent')
fig.show()
fig = px.sunburst(df_country.query("year == 2007"), path=['continent', 'country'], values='pop',
color='lifeExp', hover_data=['iso_alpha'],)
fig.show()
px.choropleth(df_country,
locations="iso_alpha",
color="lifeExp",
hover_name="country",
animation_frame="year",
color_continuous_scale='Plasma',
height=600, range_color=[20,80],
)
# geopandas, folium, geemap, etc
import folium
url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
country_geo = f'{url}/world-countries.json'
Map function:
Choropleth
m = folium.Map(location=[0, 0], zoom_start=1.)
choropleth = folium.Choropleth(geo_data=country_geo, data=df_country,
columns=['country', 'lifeExp'],
key_on='feature.properties.name', # 'feature.id' for country code (ex ALB, etc),
name="choropleth",
nan_fill_color='white',
fill_color='YlGn',
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Life expectancy (years)",
highlight=True,
smooth_factor=0,
).add_to(m)
# add labels indicating the name of the community
style_function = "font-size: 15px; font-weight: bold"
choropleth.geojson.add_child(
folium.features.GeoJsonTooltip(['name'], style=style_function, labels=False))
# create a layer control
folium.LayerControl().add_to(m)
m
df_carshare = px.data.carshare()
print(df_carshare.shape)
df_carshare.head()
(249, 4)
| | centroid_lat | centroid_lon | car_hours | peak_hour |
|---|---|---|---|---|
| 0 | 45.471549 | -73.588684 | 1772.750000 | 2 |
| 1 | 45.543865 | -73.562456 | 986.333333 | 23 |
| 2 | 45.487640 | -73.642767 | 354.750000 | 20 |
| 3 | 45.522870 | -73.595677 | 560.166667 | 23 |
| 4 | 45.453971 | -73.738946 | 2836.666667 | 19 |
df_carshare.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 249 entries, 0 to 248 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 centroid_lat 249 non-null float64 1 centroid_lon 249 non-null float64 2 car_hours 249 non-null float64 3 peak_hour 249 non-null int64 dtypes: float64(3), int64(1) memory usage: 7.9 KB
df_carshare.describe()
| | centroid_lat | centroid_lon | car_hours | peak_hour |
|---|---|---|---|---|
| count | 249.000000 | 249.000000 | 249.000000 | 249.000000 |
| mean | 45.523417 | -73.591834 | 1092.528782 | 8.787149 |
| std | 0.035177 | 0.033098 | 572.187677 | 7.223874 |
| min | 45.448903 | -73.738946 | 33.250000 | 0.000000 |
| 25% | 45.497804 | -73.618625 | 665.583333 | 3.000000 |
| 50% | 45.527905 | -73.587318 | 1020.916667 | 5.000000 |
| 75% | 45.546145 | -73.570955 | 1414.916667 | 15.000000 |
| max | 45.610879 | -73.512460 | 3274.000000 | 23.000000 |
fig = px.scatter_mapbox(df_carshare, lat="centroid_lat", lon="centroid_lon", color="peak_hour", size="car_hours",
color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10,
mapbox_style="carto-positron")
fig.show()
## load data
from os import path
ROOT_DIR = path.dirname(path.realpath("__file__"))
print(ROOT_DIR)
data = pd.read_csv(path.join(ROOT_DIR, "data", "data-analysis.csv"))
data.shape
/Users/mouslydiaw/Documents/ensae/ensae_project/courses/python_crash_course
(3712, 9)
data.head()
| | Unnamed: 0 | Indicator Code | Topic | Indicator Name | Country Name | Country Code | Region | Year | Value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 152 | NY.GDP.PCAP.PP.KD | Economic Policy & Debt: Purchasing power parity | GDP per capita, PPP (constant 2011 internation... | Albania | ALB | Europe & Central Asia | 2010 | 9927.119576 |
| 1 | 153 | NY.GDP.PCAP.PP.KD | Economic Policy & Debt: Purchasing power parity | GDP per capita, PPP (constant 2011 internation... | Antigua and Barbuda | ATG | Latin America & Caribbean | 2010 | 19212.720131 |
| 2 | 154 | NY.GDP.PCAP.PP.KD | Economic Policy & Debt: Purchasing power parity | GDP per capita, PPP (constant 2011 internation... | Argentina | ARG | Latin America & Caribbean | 2010 | 18712.063077 |
| 3 | 155 | NY.GDP.PCAP.PP.KD | Economic Policy & Debt: Purchasing power parity | GDP per capita, PPP (constant 2011 internation... | Armenia | ARM | Europe & Central Asia | 2010 | 6702.848006 |
| 4 | 156 | NY.GDP.PCAP.PP.KD | Economic Policy & Debt: Purchasing power parity | GDP per capita, PPP (constant 2011 internation... | Australia | AUS | East Asia & Pacific | 2010 | 41384.923552 |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3712 entries, 0 to 3711 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Unnamed: 0 3712 non-null int64 1 Indicator Code 3712 non-null object 2 Topic 3712 non-null object 3 Indicator Name 3712 non-null object 4 Country Name 3712 non-null object 5 Country Code 3712 non-null object 6 Region 3712 non-null object 7 Year 3712 non-null int64 8 Value 2333 non-null float64 dtypes: float64(1), int64(2), object(6) memory usage: 261.1+ KB
data2015 = data[data.Year == 2015]
data2015.shape
(464, 9)
1 == 1
[] is []
list() is list()
tuple() is tuple()
57663463467 is 57663463467
a = 2
b = 29
a != b
True
a >= b, a < b
(False, True)
The Zen of Python:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Python evolves. But there are limits:
from __future__ import braces
Perhaps the most well-known statement type is the if statement. For example:
x = int(input("Please enter an integer: "))
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
More
There can be zero or more elif parts, and the else part is optional. The keyword ‘elif’ is short for ‘else if’, and is useful to avoid excessive indentation. An if … elif … elif … sequence is a substitute for the switch or case statements found in other languages.
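As an illustration of an if … elif … else chain standing in for a switch statement, here is a minimal sketch. The function describe_status and its status codes are hypothetical and chosen only for the example.

```python
def describe_status(code):
    """Map a hypothetical numeric status code to a label."""
    if code == 200:
        return 'OK'
    elif code == 404:
        return 'Not found'
    elif code == 500:
        return 'Server error'
    else:
        return 'Unknown'

labels = [describe_status(c) for c in (200, 404, 500, 418)]
print(labels)  # ['OK', 'Not found', 'Server error', 'Unknown']
```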
# Measure some strings:
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))
cat 3 window 6 defenestrate 12
# Create a sample collection
users = {'Hans': 'active', 'Éléonore': 'inactive', '景太郎': 'active'}
# Strategy: Iterate over a copy
for user, status in users.copy().items():
    if status == 'inactive':
        del users[user]
print(f"Users: {users}")
# Strategy: Create a new collection
active_users = {}
for user, status in users.items():
    if status == 'active':
        active_users[user] = status
print(f"Active users: {active_users}")
Users: {'Hans': 'active', '景太郎': 'active'}
Active users: {'Hans': 'active', '景太郎': 'active'}
for i in range(5):
    print(i)
0 1 2 3 4
list(range(5, 10))
[5, 6, 7, 8, 9]
list(range(0, 10, 3))
[0, 3, 6, 9]
list(range(-10, -100, -30))
[-10, -40, -70]
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print(i, a[i])
0 Mary 1 had 2 a 3 little 4 lamb
In most such cases, however, it is more convenient to use the enumerate() function (see Looping Techniques).
list(enumerate(a))
[(0, 'Mary'), (1, 'had'), (2, 'a'), (3, 'little'), (4, 'lamb')]
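In a loop, enumerate() is usually unpacked into an index and an item, which avoids indexing with range(len(a)). A short sketch with the same list; lines and numbered_from_one are illustrative names.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

lines = []
for i, word in enumerate(a):
    lines.append(f"{i} {word}")

# The optional start argument changes the first index.
numbered_from_one = list(enumerate(a, start=1))

print(lines[0])              # 0 Mary
print(numbered_from_one[0])  # (1, 'Mary')
```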
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n//x)
            break
    else:
        # loop fell through without finding a factor
        print(n, 'is a prime number')
2 is a prime number 3 is a prime number 4 equals 2 * 2 5 is a prime number 6 equals 2 * 3 7 is a prime number 8 equals 2 * 4 9 equals 3 * 3
# continues with the next iteration of the loop
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found an odd number", num)
Found an even number 2 Found an odd number 3 Found an even number 4 Found an odd number 5 Found an even number 6 Found an odd number 7 Found an even number 8 Found an odd number 9
The pass statement does nothing. It can be used when a statement is required syntactically but the program requires no action. For example:
while True:
    pass # Busy-wait for keyboard interrupt
--------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) <ipython-input-128-1d3815b702a0> in <module> 1 while True: ----> 2 pass # Busy-wait for keyboard interrupt (Ctrl+C) KeyboardInterrupt:
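Besides busy-waiting, pass is commonly used as a placeholder body for a class or function that is not written yet. A small sketch; EmptyRecord and not_implemented_yet are hypothetical names.

```python
class EmptyRecord:
    pass  # a class body needs at least one statement

def not_implemented_yet(*args):
    pass  # placeholder body; calling it returns None

record = EmptyRecord()
record.name = 'example'  # attributes can still be attached later
result = not_implemented_yet(1, 2, 3)

print(record.name, result)  # example None
```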
Python "warts" are things for which people have criticised Python, typically aspects of the language or mechanisms of its implementation
Is not always clear:
y = 0
for x in range(10):
    y = x
x
[x for x in range(10, 20)]
x
Python follows the LEGB Rule (after https://www.amazon.com/dp/0596513984/):
x = 3
def foo():
    x=4
    def bar():
        print(x) # Accesses x from foo's scope
    bar() # Prints 4
    x=5
    bar() # Prints 5
foo()
4 5
See scope_resolution_legb_rule.ipynb for some additional readings on scope.
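LEGB stands for Local, Enclosing, Global, Built-in, which is the order in which Python looks up a name. The sketch below shows one lookup at each level; all function names are made up for the illustration.

```python
x = 'global x'

def outer():
    x = 'enclosing x'

    def inner_local():
        x = 'local x'      # Local: found first
        return x

    def inner_enclosing():
        return x           # Enclosing: taken from outer()

    return inner_local(), inner_enclosing()

def read_global():
    return x               # Global: module-level name

def read_builtin():
    return len('abc')      # Built-in: len is looked up last, in builtins

print(outer())         # ('local x', 'enclosing x')
print(read_global())   # global x
print(read_builtin())  # 3
```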
def function():
    for i in range(10):
        yield i
function()
<generator object function at 0x7f904618ccf0>
for y in function():
    print(y)
0 1 2 3 4 5 6 7 8 9
def do_something(a, b, c):
    return (a, b, c)
do_something(1, 2, 3)
(1, 2, 3)
def do_something_else(a=1, b=2, c=3):
    return (a, b, c)
do_something_else()
(1, 2, 3)
def some_function(start=[]):
    start.append(1)
    return start
result = some_function()
result
[1]
result.append(2)
other_result = some_function()
other_result
[1, 2, 1]
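The surprising [1, 2, 1] comes from the default list being created once, when the function is defined, and then shared by every call. A common way around this is a None default, sketched below; some_function_fixed is a hypothetical name for the corrected variant.

```python
def some_function_fixed(start=None):
    # Create a fresh list on every call instead of sharing one default object.
    if start is None:
        start = []
    start.append(1)
    return start

result = some_function_fixed()
result.append(2)
other_result = some_function_fixed()

print(result)        # [1, 2]
print(other_result)  # [1]
```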
"List comprehension" is the idea of writing some code inside of a list that will generate a list.
Consider the following:
[x ** 2 for x in range(10)]
temp_list = []
for x in range(10):
    temp_list.append(x ** 2)
temp_list
fruits = ['Banana', 'Apple', 'Lime']
loud_fruits = [fruit.upper() for fruit in fruits]
print(loud_fruits)
# List and the enumerate function
list(enumerate(fruits))
['BANANA', 'APPLE', 'LIME']
[(0, 'Banana'), (1, 'Apple'), (2, 'Lime')]
But list comprehension is much more concise.
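A comprehension can also filter items with a trailing if clause. The sketch below compares it with the equivalent explicit loop; even_squares is an illustrative name.

```python
# Keep only the even numbers and square them, in one expression.
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

# The equivalent explicit loop.
temp_list = []
for x in range(10):
    if x % 2 == 0:
        temp_list.append(x ** 2)

print(even_squares)               # [0, 4, 16, 36, 64]
print(even_squares == temp_list)  # True
```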
%matplotlib notebook
After the magic, we then need to import the matplotlib library:
import matplotlib.pyplot as plt
Python has many, many libraries. We will use a few over the course of the semester.
To create a simple line plot, just give a list of y-values to the function plt.plot().
plt.plot([5, 8, 2, 6, 1, 8, 2, 3, 4, 5, 6])
[<matplotlib.lines.Line2D at 0x7f9044f55580>]
But a plot should always have labels on the x and y axes, and a title. Read the documentation on matplotlib and add labels and a title to the plot above:
http://matplotlib.org/api/pyplot_api.html
Another commonly used library (especially with matplotlib) is numpy. It is often imported as:
Closures are functions that capture some of the local bindings to variables.
def return_a_closure():
    dict = {}
    def hidden(operator, value, other=None):
        if operator == "add":
            dict[value] = other
        else:
            return dict[value]
    return hidden
thing = return_a_closure()
thing("add", "apple", 42)
thing("get", "apple")
42
Where is dict?
See http://www.programiz.com/python-programming/closure for more examples.
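One way to see where the captured variable lives is to inspect the closure object itself. The sketch below repeats the example with the local dictionary renamed to storage (an assumption made here only to avoid shadowing the built-in dict) and reads it back through the function's __closure__ attribute.

```python
def return_a_closure():
    storage = {}  # renamed from `dict` to avoid shadowing the built-in

    def hidden(operator, value, other=None):
        if operator == "add":
            storage[value] = other
        else:
            return storage[value]
    return hidden

thing = return_a_closure()
thing("add", "apple", 42)

# Names of the captured variables and the current content of the first cell.
captured_names = thing.__code__.co_freevars
captured_value = thing.__closure__[0].cell_contents

print(captured_names)  # ('storage',)
print(captured_value)  # {'apple': 42}

# Each call to return_a_closure() gets its own independent storage.
other_thing = return_a_closure()
print(other_thing.__closure__[0].cell_contents)  # {}
```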
https://medium.com/cs-code/beginners-guide-to-using-git-8e5001791fa6
https://www.freecodecamp.org/news/learn-the-basics-of-git-in-under-10-minutes-da548267cc91/
https://www.edureka.co/blog/git-commands-with-example/
https://education.github.com/git-cheat-sheet-education.pdf
https://git-scm.com/book/fr/v2/Les-bases-de-Git-Les-alias-Git